-
-
Notifications
You must be signed in to change notification settings - Fork 131
Add support for "true" cursor based pagination in connections #730
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add support for "true" cursor based pagination in connections #730
Conversation
Add support for nested cursor-pagination connections
Reviewer's Guide by SourceryThis pull request introduces Updated class diagram for StrawberryDjangoQuerySetConfigclassDiagram
class StrawberryDjangoQuerySetConfig {
-optimized: bool
-optimized_by_prefetching: bool
-type_get_queryset_did_run: bool
-ordering_descriptors: list[OrderingDescriptor] | None
}
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @diesieben07 - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider adding a section to the documentation that compares and contrasts
ListConnection
andDjangoCursorConnection
, highlighting when each should be used. - It might be worth adding a note about the implications of strictly ordered QuerySets in the documentation.
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟡 Testing: 1 issue found
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
## Cursor based connections | ||
|
||
As an alternative to the default `ListConnection`, `DjangoCursorConnection` is also available. | ||
It supports pagination through a Django `QuerySet` via "true" cursors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Clarify the meaning of "true cursors". Suggest using "offset-based cursors" vs "range-based cursors" to distinguish the approaches.
The term "true cursors" might be confusing to users. Using more descriptive terms like "offset-based cursors" and "range-based cursors" would improve clarity.
Suggested implementation:
It supports pagination through a Django `QuerySet` via range-based cursors.
`ListConnection` uses offset-based cursors (slicing) to achieve pagination, which can negatively affect performance for huge datasets,
return [getattr(obj, descriptor.attname) for descriptor in descriptors] | ||
|
||
|
||
def build_tuple_compare( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (complexity): Consider refactoring the complex, nested logic in build_tuple_compare
and apply_cursor_pagination
into smaller, well-named helper functions to improve readability and maintainability by reducing nested code blocks and improving readability..
Consider splitting the complex, nested logic into smaller helper functions. For example, the loop in build_tuple_compare
mixes comparator and equality handling with nested conditionals. You can extract the "compare" logic into a helper:
def _compare_field(descriptor: OrderingDescriptor, field_value: Any, before: bool) -> Q:
value_expr = Value(field_value, output_field=descriptor.order_by.expression.output_field)
comparator = descriptor.get_comparator(value_expr, before)
eq = descriptor.get_eq(value_expr)
if comparator is None:
return eq
return comparator | (eq & Q())
def build_tuple_compare(
descriptors: list[OrderingDescriptor],
cursor_values: list,
before: bool,
) -> Q:
comparators = [
_compare_field(descriptor, value, before)
for descriptor, value in zip(reversed(descriptors), reversed(cursor_values))
]
# Combine comparators with an 'AND' chain
current = Q()
for comp in comparators:
current &= comp
return current
Similarly, consider isolating parts of the slicing logic in apply_cursor_pagination
into a small helper. For example:
def _apply_slice(qs: QuerySet, slice_: slice, related_field_id: Optional[str]) -> QuerySet:
if related_field_id:
offset = slice_.start or 0
return apply_window_pagination(qs, related_field_id=related_field_id, offset=offset, limit=slice_.stop - offset)
return qs[slice_]
# Then in apply_cursor_pagination replace:
if slice_ is not None:
qs = _apply_slice(qs, slice_, related_field_id)
This approach keeps all functionality intact while reducing nested code blocks and improving readability.
…rsor-based-connection
I don't know why the Typing test fails. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #730 +/- ##
==========================================
+ Coverage 88.29% 88.99% +0.70%
==========================================
Files 42 43 +1
Lines 3920 4161 +241
==========================================
+ Hits 3461 3703 +242
+ Misses 459 458 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job here, really appreciate your contributions ❤️
Left a couple of comments/suggestions
get_queryset_config(qs).optimized_by_prefetching = True | ||
return qs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: is this change safe? If I remember correctly, the reason we would do this was to mark a prefetch query as optimized, because that config there could be lost
Is that not an issue anymore with the changes here? I sincerely don't remember how it would be lost and even if we have a test that would break without this (hopefully we do)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding it is safe, yes. If I change is_optimized_by_prefetching
to just return False
then various tests fail, most importantly this one:
def test_nested_pagination(gql_client: utils.GraphQLTestClient): |
My understanding of why this code was written as it was before my change is as follows:
get_queryset_config
was broken in that it never set the configuration on the QuerySet. This wasn't caught because- There were no tests for it.
- If it fails, that is only catastrophic for
is_optimized_by_prefetching
. The other things in theQuerySet
config can cause the optimizer to optimize aQuerySet
twice or they can cause a type'sget_queryset
to be called twice. Both of these are usually idempotent so don't cause any issues except for wasted CPU cycles.
- The above caused the
get_queryset_config
bug to never surface untilis_optimized_by_prefetching
was introduced and at that point the config disappearing was wrongly attributed to being related toprefetch_related
when in realityget_queryset_config
never worked.
class AttrHelper: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Why do we have this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to serialize the values of all order_by
columns into the cursor, which is just a string. The fields could be of any type so we don't want to just str()
them. Django model fields already have built-in serialization to and from strings, which Django uses for serializing and deserializing model instances (mainly for fixtures).
The API however is unfortunately somewhat clunky. Instead of Field.value_to_string(value)
we get Field.value_to_string(obj)
with obj
being the model to look up the field on. So you could do something like
field_value_str = some_model_field.value_to_string(my_model_instance)
This works for fields which are present on the model, but users can order by arbitrary annotations:
qs.alias(name_up=Upper('name')).order_by('name_up')
Now we need to somehow turn the value of that annotation into a string via value_to_string
- but there's no actual field for it. Hence we construct this AttrHelper
and set the attribute that the field is looking for on it. This happens here:
https://github.com/diesieben07/strawberry-graphql-django/blob/39dcdd6e09aaf20c831707d6e01e4bfdacb24028/strawberry_django/relay_cursor.py#L179-L184
We can't use a plain object()
, because you can't set arbitrary attributes on them.
Django's own ArrayField
for PostgreSQL uses a similar technique:
https://github.com/django/django/blob/ab1b9cc1b38e8735979c817f42e8f54a5795c4f7/django/contrib/postgres/fields/array.py#L166-L167
This pull request adds a new connection class
DjangoCursorConnection
which supports efficient cursor-based pagination through any DjangoQuerySet
without relying on offset-slicing.Description
ListConnection
uses slicing to achieve pagination. This works for the general case, but can be inefficient for large datasets, because large page numbers result in a largeOFFSET
in SQL. An alternative to limit/offset pagination is cursor based pagination, which replacesOFFSET
by range queries such asQ(due_date__gt=...)
.DjangoCursorConnection
implements this approach.How it works
DjangoCursorConnection
inspects theQuerySet
and extracts its ordering parameters. It then uses those parameters to construct the cursors. For example, fororder_by("due_date", "pk")
the ordering parameters would bedue_date
andpk
. If the ordering parameter is an expression or not a direct field on the model (e.g.order_by(Upper("name"))
ororder_by("project__name")
, a new annotation will be added to the queryset, mirroring the ordering expression, so that the value can be extracted later.The extracted values are then encoded into a cursor.
When paginating, the cursor is deconstructed into its parts again and those parts are then used to build a pagination filter.
For example, when ordering by
"due_date", "pk"
, the cursor might contain the parts"2025-03-01"
and,"3"
. If that cursor is passed forafter
, the following filter would be constructed:Q(due_date__gt="2025-03-01") | (Q(due_date="2025-03-01") & Q(pk__gt="3"))
Serializing and deserializing the field values to strings is dedicated to the model field implementation, ensuring maximal compatibility.
Other
During the implementation I discovered a bug in
get_queryset_config
. It is used in the code as if it did a "get or create" operation, setting the config on the QuerySet if not already present. However it did not actually do so.This is likely the cause for why this hack was implemented.
While writing tests I have noticed that
strawberry_django.field(disable_optimization=True)
had no effect when used on a top level field. I have fixed this.Currently, the code lives in a separate
relay_cursor.py
file. I have chosen to do this for now to make the diff of this PR easier to parse. However I think the relay code should be refactored so thatrelay.py
is removed and we haverelay/__init__.py
instead. Then the code can be split up into multiple files and still offer the same imports. What do you think?I have fixed
get_queryset_config
, added specific tests for it and removed the (now unnecessary) hack inis_optimized_by_prefetching
.Types of Changes
Checklist